Analyzing Inexact Hypergradients for Bilevel Learning
Estimating hyperparameters has been a long-standing problem in machine
learning. We consider the case where the task at hand is modeled as the
solution to an optimization problem. Here the exact gradient with respect to
the hyperparameters cannot be feasibly computed and approximate strategies are
required. We introduce a unified framework for computing hypergradients that
generalizes existing methods based on the implicit function theorem and
automatic differentiation/backpropagation, showing that these two seemingly
disparate approaches are actually tightly connected. Our framework is extremely
flexible, allowing its subproblems to be solved with any suitable method, to
any degree of accuracy. We derive a priori and computable a posteriori error
bounds for all our methods, and numerically show that our a posteriori bounds
are usually more accurate. Our numerical results also show that, surprisingly,
for efficient bilevel optimization, the choice of hypergradient algorithm is at
least as important as the choice of lower-level solver.
Comment: Accepted to IMA Journal of Applied Mathematics
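As a minimal sketch of the implicit-function-theorem route to inexact hypergradients (not the paper's unified framework itself), the toy Python example below differentiates through a ridge-regularized lower-level problem: the lower level is solved inexactly by gradient descent, and the IFT linear system is solved inexactly by truncated conjugate gradients. The problem data, step sizes, and iteration counts are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 10))
b = rng.standard_normal(20)
x_target = rng.standard_normal(10)  # hypothetical target for the upper-level loss


def solve_lower(theta, n_iters=200):
    """Inexactly solve x*(theta) = argmin_x ||Ax - b||^2 + theta ||x||^2
    by gradient descent (any suitable lower-level solver could be used)."""
    n = A.shape[1]
    H = 2.0 * (A.T @ A + theta * np.eye(n))   # lower-level Hessian
    lin = 2.0 * A.T @ b
    step = 1.0 / np.linalg.eigvalsh(H).max()  # step from the Lipschitz constant
    x = np.zeros(n)
    for _ in range(n_iters):
        x -= step * (H @ x - lin)
    return x, H


def conjugate_gradient(H, rhs, n_iters=20):
    """Inexactly solve H q = rhs; truncating CG yields an inexact hypergradient."""
    q = np.zeros_like(rhs)
    r = rhs - H @ q
    p = r.copy()
    rs = r @ r
    for _ in range(n_iters):
        Hp = H @ p
        alpha = rs / (p @ Hp)
        q += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        if np.sqrt(rs_new) < 1e-10:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return q


def inexact_hypergradient(theta):
    """IFT hypergradient: d f / d theta = -(d^2 g / dx dtheta)^T H^{-1} grad_x F(x*)."""
    x, H = solve_lower(theta)
    grad_F = x - x_target        # gradient of the upper loss 0.5 * ||x - x_target||^2
    q = conjugate_gradient(H, grad_F)
    cross = 2.0 * x              # d^2 g / dx dtheta for the ridge term theta * ||x||^2
    return -(cross @ q)


print("hypergradient at theta=0.5:", inexact_hypergradient(0.5))
```

Shortening either inner loop trades hypergradient accuracy for cost, which is exactly the trade-off the a priori and a posteriori bounds quantify.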
On Optimal Regularization Parameters via Bilevel Learning
Variational regularization is commonly used to solve linear inverse problems,
and involves augmenting a data fidelity term with a regularizer. The regularizer is
used to promote a priori information, and is weighted by a regularization
parameter. Selection of an appropriate regularization parameter is critical,
with various choices leading to very different reconstructions. Existing
strategies such as the discrepancy principle and L-curve can be used to
determine a suitable parameter value, but in recent years a supervised machine
learning approach called bilevel learning has been employed. Bilevel learning
is a powerful framework to determine optimal parameters, and involves solving a
nested optimisation problem. While previous strategies enjoy various
theoretical results, the well-posedness of bilevel learning in this setting is
still a developing field. One necessary property is positivity of the
determined regularization parameter. In this work, we provide a new condition
that better characterises positivity of optimal regularization parameters than
the existing theory. Numerical results verify and explore this new condition
for both small- and large-dimensional problems.
Comment: 26 pages, 6 figures
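The toy Python sketch below illustrates the bilevel setting (not the paper's positivity condition itself): it scans the regularization parameter of a Tikhonov-regularized reconstruction against a supervised upper-level loss and checks whether the minimizer is strictly positive. The forward operator, noise level, and parameter grid are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 30, 30
A = rng.standard_normal((m, n)) / np.sqrt(m)    # hypothetical forward operator
x_true = rng.standard_normal(n)                 # hypothetical ground-truth signal
y = A @ x_true + 0.05 * rng.standard_normal(m)  # noisy measurements


def reconstruct(alpha):
    """Tikhonov solution x_alpha = argmin_x ||Ax - y||^2 + alpha ||x||^2."""
    return np.linalg.solve(A.T @ A + alpha * np.eye(n), A.T @ y)


def upper_loss(alpha):
    """Supervised upper-level loss against the known ground truth."""
    return 0.5 * np.linalg.norm(reconstruct(alpha) - x_true) ** 2


alphas = np.logspace(-6, 2, 200)
losses = [upper_loss(a) for a in alphas]
best = alphas[int(np.argmin(losses))]
print(f"loss at alpha = 0: {upper_loss(0.0):.4f}")
print(f"best alpha > 0:    {best:.4g}  (loss {min(losses):.4f})")
# If the best positive alpha beats alpha = 0, the learned regularization
# parameter is strictly positive for this problem instance.
```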
A temporal multiscale approach for MR Fingerprinting
Quantitative MRI (qMRI) is becoming increasingly important for research and
clinical applications; however, state-of-the-art reconstruction methods for
qMRI are computationally prohibitive. We propose a temporal multiscale approach
to reduce computation times in qMRI. Instead of computing exact gradients of
the qMRI likelihood, we propose a novel approximation relying on the temporal
smoothness of the data. These gradients are then used in a coarse-to-fine (C2F)
approach, for example using coordinate descent. The C2F approach was also found
to improve the accuracy of solutions, compared to similar methods where no
multiscaling was used.
Comment: 4 pages, 3 figures. Title revised
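A minimal sketch of the coarse-to-fine idea, on a toy temporal model rather than the qMRI likelihood: early coordinate-descent sweeps use a heavily subsampled time grid, which temporal smoothness makes a cheap but accurate proxy for the full data, and later sweeps refine on the full grid. The signal model, parameters, and strides below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
t_full = np.linspace(0.0, 1.0, 256)  # dense time grid
basis = lambda t: np.stack([np.exp(-2.0 * t), np.exp(-8.0 * t)])  # temporal model
p_true = np.array([1.5, 0.8])        # hypothetical tissue parameters
data = p_true @ basis(t_full) + 0.01 * rng.standard_normal(t_full.size)


def coordinate_descent(p, t, d, n_sweeps=20):
    """Exact coordinate minimization for the least-squares fit p @ basis(t) ~ d."""
    Phi = basis(t)
    p = p.copy()
    for _ in range(n_sweeps):
        for i in range(p.size):
            # residual with coordinate i removed, then its closed-form update
            residual_wo_i = d - p @ Phi + p[i] * Phi[i]
            p[i] = (Phi[i] @ residual_wo_i) / (Phi[i] @ Phi[i])
    return p


# Coarse-to-fine: sweep on increasingly fine temporal grids.
p = np.zeros(2)
for stride in (32, 8, 1):
    p = coordinate_descent(p, t_full[::stride], data[::stride])
print("estimated:", p, " true:", p_true)
```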
On the convergence and sampling of randomized primal-dual algorithms and their application to parallel MRI reconstruction
Stochastic Primal-Dual Hybrid Gradient (SPDHG) is an algorithm to efficiently
solve a wide class of nonsmooth large-scale optimization problems. In this
paper we contribute to its theoretical foundations and prove its almost sure
convergence for functionals that are convex but not necessarily strongly
convex or smooth. We also prove its convergence for arbitrary sampling. In addition, we
study SPDHG for parallel Magnetic Resonance Imaging reconstruction, where data
from different coils are randomly selected at each iteration. We apply SPDHG
using a wide range of random sampling methods and compare its performance
across a range of settings, including mini-batch size and step size parameters.
We show that the sampling can significantly affect the convergence speed of
SPDHG, and that in many cases an optimal sampling can be identified.
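A minimal sketch of SPDHG with serial uniform sampling over coils, assuming random matrices as stand-ins for the per-coil forward operators (in parallel MRI these would be subsampled Fourier transforms composed with coil sensitivities); the sparsity penalty, step sizes, and iteration count are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, n_coils = 64, 32, 4
# Hypothetical per-coil forward operators and data.
A = [rng.standard_normal((m, n)) / np.sqrt(m) for _ in range(n_coils)]
x_true = np.zeros(n)
x_true[rng.choice(n, 8, replace=False)] = rng.standard_normal(8)
b = [Ac @ x_true + 0.01 * rng.standard_normal(m) for Ac in A]
lam = 0.01  # weight of the l1 penalty g(x) = lam * ||x||_1

prob = [1.0 / n_coils] * n_coils             # uniform serial sampling
norms = [np.linalg.norm(Ac, 2) for Ac in A]
sigma = [0.9 / s for s in norms]             # per-coil dual step sizes
tau = 0.9 / (n_coils * max(norms))           # primal step; satisfies
                                             # tau * sigma_c * ||A_c||^2 < prob_c

x = np.zeros(n)
y = [np.zeros(m) for _ in range(n_coils)]
z = np.zeros(n)      # z = sum_c A_c^T y_c
zbar = z.copy()

soft = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

for k in range(2000):
    # primal prox step on g
    x = soft(x - tau * zbar, tau * lam)
    # sample one coil; prox of sigma * f_c^* for f_c(u) = 0.5 * ||u - b_c||^2
    c = rng.integers(n_coils)
    y_new = (y[c] + sigma[c] * (A[c] @ x - b[c])) / (1.0 + sigma[c])
    delta = A[c].T @ (y_new - y[c])
    y[c] = y_new
    z += delta
    zbar = z + delta / prob[c]   # extrapolation weighted by sampling probability

print("relative error:", np.linalg.norm(x - x_true) / np.linalg.norm(x_true))
```

Note that the primal step size depends on the sampling probabilities, so changing the sampling scheme changes the admissible step sizes, one way the sampling influences convergence speed.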